Analysis of Multi-Scale Fractal Dimension to Classify Human Motion 
In recent years there has been considerable interest in human action recognition. Several approaches have
been developed to enhance automatic video analysis. Although the computer vision community has made
progress, the proper classification of human motion is still a hard and challenging task. The objective of
this study is to investigate the use of the 3D multi-scale fractal dimension to recognize motion patterns in
videos. To develop a robust strategy for human motion classification, we propose a method in which the
Fourier transform is used to calculate the derivative, so that all data points are taken into account. Our
results show that different accuracy rates are obtained for different databases. We believe that, in specific
applications, our results are a first step toward an automatic monitoring system that can be applied in
security systems, traffic monitoring, biology, physical therapy and cardiovascular disease, among many
others.



I. INTRODUCTION 

In recent years, there has been a growth of research activity
aimed at developing human motion classifiers to enhance
automatic video analysis. Several approaches for tracking
movement have been proposed in the literature. Basically,
they differ in the type of object representation, in the varying
size, position and shape of the moving objects, and in the type
of motion and appearance model applied.

All these perspectives are chosen according to the context
and the end use for which the monitoring is conducted. Regarding
the context in which movement can be recognized, different
and interesting applications have been considered, for instance
human-computer interfaces, gesture recognition, video
indexing and browsing, analysis of sports events and video
surveillance. In these situations, recognizing events of
particular interest, assessing their complexity and making
inferences about their evolution play a crucial role in image
processing research. All these tasks can be done by considering
the knowledge that can be obtained from the motion patterns.

Although the computer vision community has made progress,
the proper classification of human motion is still a hard and
challenging task. This is because (i) it is not possible to
control the acquisition of the image sequence, and (ii) the
images can suffer from poor illumination, blur, occlusion or
several other problems. Moreover, real-world situations differ
considerably from the controlled conditions tested in the
laboratory.

An appropriate method is required to assess the motion in
different videos. In this study we focus on the classification of
the following single human motions: "walk", "skip", "kick",
"playing basketball", "run", "jack", "jump", "side" and "wave"
in unconstrained indoor environments. The purpose of this
paper is to investigate the use of the 3D multi-scale fractal
dimension to recognize motion patterns in videos. To study
and develop a more robust strategy for human motion
classification, we use the Fourier transform, rather than
numerical methods, to calculate the derivative, so that all data
points are taken into account.

Each motion class is characterized by a signature obtained
with a multi-scale fractal dimension-based approach. Different
motions provide distinct signatures, so we can discern
dissimilar motions. The multi-scale fractal dimension
was computed using the Bouligand-Minkowski method, which
the literature reports as a robust, accurate and consistent way
to estimate the fractal dimension [3]. The motion signature
is a curve of the multi-scale fractal dimension that represents
the changes in shape complexity, frame by frame, for the
different scales observed.

The rest of the paper is organized as follows. The extraction of
signatures by multi-scale fractal dimension is explained in Section
II. Experimental results and discussions are presented in Section
III and conclusions in Section IV.



II. SIGNATURES BY MULTI-SCALE FRACTAL 
DIMENSION 

Fractal analysis has been widely applied to describe different
problems in pattern recognition, image processing and
many other domains. Established by Benoit Mandelbrot,
fractal geometry is useful in problems that require complexity
analysis of structures across different scales [4]. It is necessary
to point out that the metric properties of fractal objects
are a function of the scale used to perform the measurement.
We can therefore describe an object with fractional values that
depict its level of complexity and spatial distribution in the
image.

A. Fractal Dimension

The fractal dimension indicates how much space is occupied
by the object, representing the degree of complexity of
the figure. For uniform and compact objects the fractal
dimension coincides with the topological dimension; for
fractal objects, however, it is a fractional value. Several methods
can be used to estimate the fractal dimension of an object,
including box-counting, mass-radius, dividers and the
Bouligand-Minkowski approach. For this study we adopt the
Bouligand-Minkowski method because of its precision and its
adaptability to the multi-scale approach. The Minkowski
approach is commonly used in shape and texture analysis.
For shape analysis it considers a two-dimensional space,
whereas for texture analysis it usually works in a
three-dimensional space, in which the third coordinate corresponds
to the gray-level intensity at each point. In our case, we also
have a three-dimensional space, but the third coordinate is
time, i.e., the sequence of images in which the actions occur
during the video.

With the Bouligand-Minkowski method we analyze the
relationship between the object and the space it occupies.
The fractal dimension is obtained by calculating the volume
of the dilated object. The dilation is performed with a sphere
of radius r (Figure 3(a)) centred at each point of the original
object; all points inside the sphere are joined to the object.
To generate a signature for the shape, we analyze the object
volume as a function of r. The algorithm used for this task
relies on the exact distance transform (EDT) [10-13], which
gives, for every point of the image, the distance to the closest
point of the object. We then compute the fractal dimension by
analyzing the log-log curve of the volume of influence versus r
(Figure 1). The fractal dimension is defined as:



$$ FD(r) = N - \frac{\log A(r)}{\log r} \tag{1} $$



where A(r) is the influence volume of the object with radius 
r and N is the number of dimensions. 
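
As a rough illustration of the Bouligand-Minkowski estimate, the sketch below dilates a small 2D object, counts the influence area A(r) with a brute-force distance computation (standing in for the exact distance transform), and takes N minus the least-squares slope of log A(r) against log r. The function names, the grid size and the chosen radii are assumptions made for this example, and fitting a slope over several radii is a common practical reading of Eq. (1) rather than something stated in the text.

```python
import math

def squared_distance_map(points, width, height):
    """Brute-force squared distance from every pixel to the nearest object point."""
    return [
        min((x - px) ** 2 + (y - py) ** 2 for px, py in points)
        for y in range(height)
        for x in range(width)
    ]

def influence_area(dist2, radius):
    """A(r): number of pixels whose distance to the object is at most r."""
    return sum(1 for d2 in dist2 if d2 <= radius * radius)

def least_squares_slope(xs, ys):
    """Slope of the best-fit line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def bouligand_minkowski_dimension(points, width, height, radii, n_dims=2):
    """Estimate FD as N minus the slope of log A(r) versus log r."""
    dist2 = squared_distance_map(points, width, height)
    log_r = [math.log(r) for r in radii]
    log_a = [math.log(influence_area(dist2, r)) for r in radii]
    return n_dims - least_squares_slope(log_r, log_a)

# Hypothetical test object: a horizontal line segment on a 65 x 65 grid.
segment = [(x, 32) for x in range(12, 53)]
fd_segment = bouligand_minkowski_dimension(segment, 65, 65, radii=range(2, 11))
print(round(fd_segment, 2))
```

For this line segment the estimate comes out close to 1, the topological dimension of a line, which is the behaviour the text describes for uniform objects. The same idea carries over to three dimensions by adding a third coordinate and setting `n_dims=3`.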

Over the video frames, the same object can be seen from
different viewpoints. In one image of the video sequence the
moving object may be close to the camera, while in another
frame it may be far from it. Because of this range of object
scales along the image sequence, and in order to have a single
descriptor for the whole video, we use an extension of the
fractal dimension, the Multi-Scale Fractal Dimension.




FIG. 1. Log-log curve of a sequence of video images generated by
Bouligand-Minkowski (horizontal axis: log r).



B. Volumetric Multi-Scale Fractal Dimension

The log-log curve produced by the Bouligand-Minkowski
method presents a wealth of detail that cannot be represented
by the single value provided by the fractal dimension. For this
reason the multi-scale fractal dimension uses the derivative to
explore the limit of infinitesimal linear interpolation. Thus it
is possible to obtain the relationship between variations in the
complexity of the object at different scales. The multi-scale
fractal dimension curve is defined through the derivative of
the log V(r) versus log r curve. To compute the influence volume
we use the Bouligand-Minkowski method in three dimensions,
and we use the derivative property of the Fourier transform
to calculate the derivatives.

In the proposed method, each image $I \in \mathbb{R}^2$ of the
sequence of images (video) is considered as a surface
$S \in \mathbb{R}^3$. Each pixel of the image is converted to a point
$p = (x, y, z)$, $p \in S$, where x and y are the coordinates of the
object in the $n \times n$ image and z is the frame in the sequence
of t images, as shown in Figure 2.

FIG. 2. Temporal variation in the sequence of video images.

The volume of the dilated shape in time V(r) (Figure 3(b)),
using a sphere of radius r, can be written as:


$$ V(r) = \sum_{i=0}^{N} \Theta(r, r_i), \tag{2} $$

where $r_i$ is the minimum distance from a point $i$ to any
point $p$ belonging to the object, $N$ is $n \times n \times t$, and
$\Theta(r, r_i)$ is the Heaviside function, which returns 1 if
$r > r_i$ and 0 otherwise.
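
A minimal sketch of Eq. (2) for a tiny binary video, assuming each frame is a list of rows in which 1 marks an object pixel. The distance r_i is computed by brute force instead of an exact distance transform, and `video_points`, `min_squared_distances` and `dilated_volume` are hypothetical names introduced only for this example.

```python
def video_points(frames):
    """Convert object pixels of every frame into points (x, y, z), z = frame index."""
    return [
        (x, y, z)
        for z, frame in enumerate(frames)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if value
    ]

def min_squared_distances(frames):
    """Squared distance r_i^2 from every voxel i to the nearest object point."""
    points = video_points(frames)
    depth, height, width = len(frames), len(frames[0]), len(frames[0][0])
    return [
        min((x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2 for px, py, pz in points)
        for z in range(depth)
        for y in range(height)
        for x in range(width)
    ]

def dilated_volume(dist2, radius):
    """V(r): count voxels with r > r_i (Heaviside step), compared on squared values."""
    return sum(1 for d2 in dist2 if radius * radius > d2)

# Hypothetical 3-frame video of 5 x 5 images with one object pixel in the middle frame.
empty = [[0] * 5 for _ in range(5)]
middle = [[0] * 5 for _ in range(5)]
middle[2][2] = 1
frames = [empty, middle, empty]

dist2 = min_squared_distances(frames)
volumes = {r: dilated_volume(dist2, r) for r in (0.5, 1.5, 2.0)}
print(volumes)  # {0.5: 1, 1.5: 19, 2.0: 27}
```

Comparing squared values keeps the strict inequality r > r_i exact for integer voxel coordinates. For real videos the brute-force minimum would be far too slow, which is why the text relies on an exact distance transform.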

FIG. 3. Dilation of shape: (a) 2D and (b) in time. Red indicates radius
$r = \sqrt{50}$, orange $r = \sqrt{100}$, yellow $r = \sqrt{200}$ and
blue $r = \sqrt{300}$.



We compute the Bouligand-Minkowski fractal dimension
D as:

$$ D = 3 - \lim_{r \to 0} \frac{\log V(r)}{\log r} \tag{3} $$

Since a three-dimensional space is considered, D lies
within [0, 3], where 3 is the number of dimensions. Depending
on the radius r, the sphere produced by a point $p \in S$
interferes with the volume of other spheres, disturbing the way
the volume of influence increases. This makes the volume of
influence V(r) very sensitive to structural changes [16].

The multi-scale fractal dimension is then obtained using:

$$ MFD = 3 - \frac{d \log V(r)}{d \log r} \tag{4} $$

The result is a curve with the fractal dimension calculated at
each spatial scale represented by the radius r (Figure 4). The
curves of the multi-scale fractal dimension were used as
signatures for the study of the motion patterns.

FIG. 4. Multi-scale Fractal Dimension of a sequence of video images.

To calculate the derivative we use a property of the Fourier
transform, which allows us to obtain the derivative of any
function from the analysis of its frequency spectrum. We
also convolve the original signal with a Gaussian
kernel in order to smooth the derivative.

Two issues have to be addressed when calculating the
derivative. The first is the spacing between the points of the
signal, because the log-log curve is very sparsely sampled at
the beginning, as shown in Figure 5. The sparse points were
ignored and the remaining points were interpolated by filling
the space between each pair of points with their average. The
second issue is that the Fourier transform does not converge
uniformly at discontinuities, causing the so-called Gibbs
phenomenon at the ends of the signal (Figure 6). To solve this
problem, the curve was replicated before and after the original
curve (Figure 7).



FIG. 5. Very sparse points are ignored (horizontal axis: log r).

FIG. 6. Fourier transform does not converge uniformly at
discontinuities, causing the so-called Gibbs phenomenon.
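
The derivative step of Section II-B can be sketched as follows, under several assumptions: the log-log curve has already been resampled on a uniform grid of log r, the curve is mirror-extended (one plausible reading of the replication described in the text) so that the periodic signal has no jump at its ends, and the Gaussian width `sigma` is expressed in samples. A naive O(n^2) DFT is used only to keep the example dependency-free, and all function names are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for short signatures."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]
    return [c / n for c in out] if inverse else out

def fourier_derivative(curve, step, sigma=2.0):
    """Smoothed derivative of a uniformly sampled curve via its spectrum."""
    n = len(curve)
    extended = list(curve) + list(reversed(curve))  # mirror copy avoids an end jump
    m = len(extended)
    spectrum = dft(extended)
    for k in range(m):
        # Signed frequency in cycles per sample; the Nyquist bin is zeroed.
        freq = k / m if k < m / 2 else (k - m) / m
        if 2 * k == m:
            freq = 0.0
        gaussian = math.exp(-2.0 * (math.pi * sigma * freq) ** 2)
        spectrum[k] *= 2j * math.pi * freq / step * gaussian
    derivative = dft(spectrum, inverse=True)
    return [c.real for c in derivative[:n]]

# Hypothetical check: log V(r) = 2 log r sampled every 0.1 has slope 2.
step = 0.1
log_volume = [2.0 * step * j for j in range(32)]
slopes = fourier_derivative(log_volume, step)
mfd = [3.0 - s for s in slopes]  # assuming the N - derivative form with N = 3
print(round(slopes[16], 3), round(mfd[16], 3))
```

Away from the ends of the curve the recovered slope is very close to 2, so the multi-scale value is close to 1, as expected for a line-like object whose dilated volume grows as r^2. Near the ends the smoothed derivative still bends toward zero because of the mirroring, which is why the choice of sigma and of the usable range of r matters.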



III. RESULTS AND DISCUSSIONS 

The analysis of motion is still a challenging problem in the
field of image processing and pattern recognition. Over the
years, several approaches using different constructs have been
proposed for action recognition, including machine learning,
optical flow [20-22] and appearance models. They differ
especially in the kind of object representation, the image
features, and the type of motion model applied.

FIG. 7. Curve replication before and after the original curve to solve
the Gibbs phenomenon.

In this paper we have studied a new strategy to characterize
motion in an image sequence and to recognize motion
patterns in order to classify the movement. First, we
used only the fractal dimension in three-dimensional space
to characterize the motion. However, it was unable to capture
structural differences in the shape of the moving object.
Another problem is that the method was not scale-invariant. To
overcome these problems, we investigate a shape in time over
multiple scales. For each scale we have a fractal dimension,
and the set of consecutive fractal dimensions is called the
signature of the video sequence. The main goal of this paper is
to use these signatures to discriminate different types of
movement.

To confirm the hypothesis that motion signatures are
distinct for different movements, we calculated the multi-scale
fractal dimension for all actions. We compared the signatures of
movements of the same type to verify that similar signatures are
obtained for movements of the same class and different
signatures for movements belonging to different classes.
Figure 8 shows an example of four different motion signatures.

FIG. 8. Motion signature by Multi-scale Fractal Dimension of four
different movements.



Basically, there are two parameters in our approach, $\sigma$ and
$r_{\max}$. The first is the standard deviation of the
Gaussian kernel, and it quantifies the level of smoothness of
the signatures. The second is the maximum scale
used to analyze an image sequence. We investigated the
effects of these two parameters and concluded that very
high values of $\sigma$ produce strongly smoothed curves, which
implies the loss of important details of the signature. We tested
$\sigma$ varying from 1 to 6 with a maximum radius of 160.
Regarding the parameter $r_{\max}$, finding the optimal value
is difficult, because it is not known a priori at what
scale the motion must be analyzed.

Our strategy was demonstrated by performing tests
on two publicly available datasets: the CMU Graphics
Lab Motion Capture Database (available at
http://mocap.cs.cmu.edu/) and the Weizmann Human Action
Dataset [26, 27]
(http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html).

A signature was then generated for each video, and we
classified the motions according to their signatures. Each video
is represented by a feature vector based on the signature of the
multi-scale fractal dimension, in which each point of the
signature is an attribute. A support vector machine (SVM) with a
10-fold cross-validation scheme was chosen to classify the
examples. SVM is based on Statistical Learning Theory, which
assumes that the data of the domain in which learning occurs are
generated independently and identically distributed according to
a probability distribution that relates the examples and the
classes [28]. Thus, SVM obtains good results on new data from the
same domain. The result of 10-fold cross-validation is the
average over 10 runs of the same experiment with random
selection of actions in the database.
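
The evaluation protocol can be illustrated with a short sketch. The paper uses an SVM in Weka; the code below swaps in a nearest-centroid classifier purely to stay dependency-free, uses plain (non-stratified) folds, and runs on made-up toy signatures, so it only shows how a 10-fold average accuracy is formed and does not reproduce the reported experiments. All names are hypothetical.

```python
import random

def nearest_centroid_fit(signatures, labels):
    """Stand-in classifier (not an SVM): one mean signature per class."""
    grouped = {}
    for signature, label in zip(signatures, labels):
        grouped.setdefault(label, []).append(signature)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def nearest_centroid_predict(centroids, signature):
    """Return the class whose mean signature is closest in squared distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(signature, centroids[label])),
    )

def cross_validate(signatures, labels, folds=10, seed=0):
    """Average accuracy over `folds` disjoint test folds of a shuffled data set."""
    order = list(range(len(signatures)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for f in range(folds):
        test = order[f::folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        model = nearest_centroid_fit(
            [signatures[i] for i in train], [labels[i] for i in train]
        )
        hits = sum(nearest_centroid_predict(model, signatures[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / folds

# Hypothetical, well-separated toy signatures for two motion classes.
rng = random.Random(1)
walk = [[1.0 + rng.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(20)]
kick = [[2.0 + rng.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(20)]
accuracy = cross_validate(walk + kick, ["walk"] * 20 + ["kick"] * 20)
print(accuracy)
```

Each signature point plays the role of one attribute, as in the text, and every video is used for testing exactly once across the ten folds.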

We carried out two sets of experiments, the first on the
Mocap database and the second on the Weizmann database.
All experiments were performed using Weka [29] with default
parameter values.



A. Mocap Database 

The Motion Capture Database contains 2605 trials in six
motion categories and 23 subcategories. We chose 66 video
sequences showing eight different subjects who perform
four distinct actions at varying speeds. The actions are
"walk", "skip", "kick" and "playing basketball". The cameras
are placed around a rectangular area of approximately 3
m x 8 m in the center of the room. Only motions that take
place in this rectangle can be captured. The Mocap database
contains videos with 53 to 1020 images of size 240 x 320.

To find the best value for $r_{\max}$ we performed tests varying
the radius from 10 to 160 with $\sigma$ equal to 1. The number of
correctly classified instances is almost constant as the radius
varies, because of the size of each video image. We can
therefore use the lowest value.

To find the best value of $\sigma$ we varied it from 1 to 6.
As $\sigma$ increases, the number of correctly classified
instances decreases, because the motion signature becomes
too smooth.

From these tests we conclude that the best values are
$r_{\max}$ equal to 10 and $\sigma$ equal to 1. With them we obtained
an accuracy of 90.91 % (standard deviation 0.34), with 60 of 66
instances correctly classified.



B. Weizmann Database 

The Weizmann human action dataset has 93 video
sequences showing nine different subjects, each performing 10
natural actions: "run", "walk", "skip", "jumping-jack"
(or "jack" for short), "jump-forward-on-two-legs" (or "jump"),
"jump-in-place-on-two-legs" (or "pjump"), "gallopsideways"
(or "side"), "wave-two-hands" and "wave-one-hand" (or
"wave"), and "bend". The Weizmann database contains videos
with 28 to 146 images of size 144 x 180. The size of a video is
not correlated with the motion.

We tested different values for $r_{\max}$ and $\sigma$. There is little
variation in the number of correct classifications as the radius
varies, but $r_{\max}$ equal to 110 gives the highest rate. With
$r_{\max}$ equal to 110, we varied the value of $\sigma$ from 1 to 6.
All combinations of radius and sigma were tested, but only the
most relevant values are reported here. Again, the number of
correctly classified instances tends to decrease as $\sigma$
increases, and the best result is obtained with $\sigma$ equal to 2.
Therefore, with radius equal to 110 and $\sigma$ equal to 2, the
classification of the Weizmann database reaches an accuracy of
79.57 % (standard deviation 0.28), with 74 of 93 instances
correctly classified.



IV. CONCLUSIONS 

This paper presented a study of motion classification based
on a frame-by-frame analysis of the complexity of a shape in
a video. Our main goal was to apply the so-called multi-scale
fractal dimension to classify videos according to their
content. We developed a strategy to classify human motion
that represents the movement contained in a video by a
multi-scale fractal dimension signature and uses a support
vector machine to classify it. We applied the method to two
real databases, the first with 66 videos and four types of
motion and the second with 93 videos and ten types of
movement, obtaining accuracies of 90.91 % and 79.57 %,
respectively. The first database has only four quite different
motion classes, whereas the second has ten classes, some of
which are similar, as in the case of "run", "side" and "skip".
It will be interesting to perform new experiments on a larger
data set, with the aim of finding a general strategy that can
potentially be applied to a wide variety of vision problems that
involve various complex structures of motion.